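In this paper, we aim to estimate the prediction error of machine learning models under the true distribution of the data at hand. We treat prediction models as data-driven black-box functions and quantify their statistical properties using non-parametric methods. We propose a novel sampling technique that exploits the underlying probability distribution information embedded in the data. The proposed method combines two existing frameworks for estimating prediction error: $m$ out of $n$ bootstrapping and iterated bootstrapping. The $m$ out of $n$ bootstrap maintains consistency, while the iterated bootstrap is commonly used for bias correction of the prediction error estimate. Using Monte Carlo uncertainty quantification techniques, we decompose the total variance of the estimator so that users can make informed decisions about measures to overcome preventable errors. In addition, through the same Monte Carlo framework, we provide a way to estimate the bias introduced by using the empirical distribution. This bias captures the sensitivity of the estimator to the input data at hand and helps in understanding the estimator's robustness. The application of the proposed uncertainty quantification is tested in a model selection case study using simulated and real datasets. We evaluate the performance of the proposed estimator in two frameworks: first, direct application as an optimization model to find the best model; second, fixing the optimization engine and using the proposed estimator as the optimizer's fitness function. Furthermore, we compare the asymptotic statistical properties and the numerical results on finite datasets of the proposed estimator against existing state-of-the-art methods.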
translated by 谷歌翻译
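As a rough illustration of the $m$ out of $n$ bootstrap mentioned in the first abstract, the sketch below resamples $m < n$ points with replacement, fits a model on them, and scores it on the points that were not drawn. It is a minimal sketch under stated assumptions, not the estimator proposed in that paper: the least-squares model, the out-of-bag scoring, and every function and parameter name are hypothetical, and the iterated-bootstrap bias correction is not included.

```python
import numpy as np

def fit_linear(X, y):
    # Least-squares fit with an intercept; stands in for any black-box model.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def m_out_of_n_bootstrap_error(X, y, m, n_rounds=200, seed=0):
    """Mean and spread of out-of-bag squared error over m-out-of-n resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = []
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=m)             # m indices, with replacement
        held_out = np.setdiff1d(np.arange(n), idx)   # points not drawn this round
        if held_out.size == 0:
            continue
        coef = fit_linear(X[idx], y[idx])
        resid = y[held_out] - predict_linear(coef, X[held_out])
        errors.append(np.mean(resid ** 2))
    errors = np.asarray(errors)
    return errors.mean(), errors.std(ddof=1)

# Synthetic data (assumed for illustration): linear signal plus noise with sd 0.5.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=200)
err_mean, err_spread = m_out_of_n_bootstrap_error(X, y, m=60)
print(err_mean, err_spread)
```

The spread across rounds gives a crude view of how much the error estimate varies with the resample; the variance decomposition described in the abstract is a separate, more detailed analysis.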
Before AVs move onto urban roads and bring unprecedented changes in traffic conditions, evaluating transportation policies and futuristic road designs related to pedestrian crossing behavior is of vital importance. Recent studies analyzed the non-causal impact of various variables on pedestrian waiting time in the presence of AVs. In contrast, we mainly investigate the causal effect of traffic density on pedestrian waiting time. We develop a Double/Debiased Machine Learning (DML) model that addresses the impact of confounding variables influencing both a policy and an outcome of interest, resulting in unbiased policy evaluation. Furthermore, we analyze the effect of traffic density by developing a copula-based joint model of two main components of pedestrian crossing behavior: pedestrian stress level and waiting time. The copula approach has been widely used in the literature for addressing self-selection problems, which can be classified as a causality analysis in travel behavior modeling. The results obtained from the copula approach and from DML are compared based on the effect of traffic density. In the DML model structure, the standard error of the density parameter is lower than in the copula approach, and the confidence interval is considerably more reliable. In addition, although the sign of the effect is the same, the copula approach estimates a smaller effect of traffic density than DML does, due to the spurious effect of confounders. In short, the DML model structure can flexibly adjust for the impact of confounders by using machine learning algorithms and is more reliable for planning future policies.
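The residual-on-residual idea behind DML in a partially linear model can be sketched as follows. This is a minimal illustration under stated assumptions, not the model from the abstract above: the nuisance functions are fitted with ordinary least squares rather than machine learning learners, the data are synthetic, and all names (`dml_partially_linear`, `ols_fit_predict`, the coefficients) are hypothetical.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_test):
    # Linear nuisance model with intercept; a stand-in for any ML regressor.
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def dml_partially_linear(y, d, X, n_folds=5, seed=0):
    """Cross-fitted estimate of theta in y = theta * d + g(X) + noise."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Residualize outcome and treatment on the confounders, out of fold.
        y_res[test_idx] = y[test_idx] - ols_fit_predict(X[train], y[train], X[test_idx])
        d_res[test_idx] = d[test_idx] - ols_fit_predict(X[train], d[train], X[test_idx])
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    psi = d_res * (y_res - theta * d_res)            # score at the estimate
    se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(d_res ** 2)
    return theta, se

# Synthetic example (assumed): confounders X drive both treatment d and outcome y.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
d = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=2000)
y = 2.0 * d + X @ np.array([3.0, 0.0, -2.0]) + rng.normal(size=2000)

theta_hat, se_hat = dml_partially_linear(y, d, X)
naive_slope = np.cov(d, y)[0, 1] / np.var(d, ddof=1)   # ignores confounders
print(theta_hat, se_hat, naive_slope)
```

In this synthetic setup the true effect is 2.0, and the naive slope is pulled away from it because the confounders move the treatment and the outcome together.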
Denoising diffusion models hold great promise for generating diverse and realistic human motions. However, existing motion diffusion models largely disregard the laws of physics in the diffusion process and often generate physically-implausible motions with pronounced artifacts such as floating, foot sliding, and ground penetration. This seriously impacts the quality of generated motions and limits their real-world application. To address this issue, we present a novel physics-guided motion diffusion model (PhysDiff), which incorporates physical constraints into the diffusion process. Specifically, we propose a physics-based motion projection module that uses motion imitation in a physics simulator to project the denoised motion of a diffusion step to a physically-plausible motion. The projected motion is further used in the next diffusion step to guide the denoising diffusion process. Intuitively, the use of physics in our model iteratively pulls the motion toward a physically-plausible space. Experiments on large-scale human motion datasets show that our approach achieves state-of-the-art motion quality and improves physical plausibility drastically (>78% for all datasets).
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion while conditioning on text prompts. We find that their synthesis behavior qualitatively changes throughout this process: Early in sampling, generation strongly relies on the text prompt to generate text-aligned content, while later, the text conditioning is almost entirely ignored. This suggests that sharing model parameters throughout the entire generation process may not be ideal. Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages. To maintain training efficiency, we initially train a single model, which is then split into specialized models that are trained for the specific stages of the iterative generation process. Our ensemble of diffusion models, called eDiff-I, results in improved text alignment while maintaining the same inference computation cost and preserving high visual quality, outperforming previous large-scale text-to-image diffusion models on the standard benchmark. In addition, we train our model to exploit a variety of embeddings for conditioning, including the T5 text, CLIP text, and CLIP image embeddings. We show that these different embeddings lead to different behaviors. Notably, the CLIP image embedding allows an intuitive way of transferring the style of a reference image to the target text-to-image output. Lastly, we show a technique that enables eDiff-I's "paint-with-words" capability. A user can select the word in the input text and paint it in a canvas to control the output, which is very handy for crafting the desired image in mind. The project page is available at https://deepimagination.cc/eDiff-I/
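3D object detection is an essential component of automated driving, and deep neural networks (DNNs) have achieved state-of-the-art performance on this task. However, deep models are notorious for assigning high confidence scores to out-of-distribution (OOD) inputs, that is, inputs not drawn from the training distribution. Detecting OOD inputs is challenging and is essential for the safe deployment of models. OOD detection has been studied extensively for classification tasks, but it has not received enough attention for object detection, specifically LiDAR-based 3D object detection. In this paper, we focus on detecting OOD inputs for LiDAR-based 3D object detection. We formulate what OOD inputs mean for object detection and propose to adapt several OOD detection methods to object detection. We accomplish this through our proposed feature extraction method. To evaluate OOD detection methods, we develop a simple but effective technique for generating OOD objects for a given object detection model. Our evaluation on the KITTI dataset shows that different OOD detection methods are biased toward detecting specific OOD objects. This emphasizes the importance of combining OOD detection methods and of more research in this direction.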
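The Sliced-Wasserstein distance (SW) is a computationally efficient and theoretically grounded alternative to the Wasserstein distance. Yet the literature on its statistical properties with respect to the distribution of slices, beyond the uniform measure, is scarce. To bring new contributions to this line of research, we leverage PAC-Bayesian theory and the central observation that SW actually hinges on a slice-distribution-dependent Gibbs risk, the kind of quantity PAC-Bayesian bounds have been designed to characterize. We provide four types of results: i) PAC-Bayesian generalization bounds that hold on what we refer to as adaptive Sliced-Wasserstein distances, i.e., distances defined with respect to any distribution of slices, ii) a procedure to learn the distribution of slices that yields a maximally discriminative SW, by optimizing our PAC-Bayesian bounds, iii) an insight on how the performance of the so-called distributional Sliced-Wasserstein distance may be explained through our theory, and iv) empirical illustrations of our findings.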
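For orientation, the standard Monte Carlo estimate of SW with slices drawn from the uniform measure on the sphere can be sketched as below. This is a minimal sketch assuming two samples of equal size; the function name and parameters are hypothetical. It covers only the uniform slice distribution, whereas the adaptive distances studied in the abstract replace that distribution with an arbitrary or learned one.

```python
import numpy as np

def sliced_wasserstein(x, y, n_slices=200, p=2, seed=0):
    """Monte Carlo SW_p between two equal-size samples of shape (n, dim)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of identical shape")
    rng = np.random.default_rng(seed)
    # Uniform directions on the unit sphere: normalize Gaussian draws.
    theta = rng.normal(size=(n_slices, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project onto each slice; 1D optimal transport matches sorted values.
    x_proj = np.sort(x @ theta.T, axis=0)
    y_proj = np.sort(y @ theta.T, axis=0)
    return np.mean(np.abs(x_proj - y_proj) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
a = rng.normal(size=(500, 2))
b = a + np.array([3.0, 0.0])          # same cloud, shifted along the first axis
print(sliced_wasserstein(a, a))       # identical samples give 0.0
print(sliced_wasserstein(a, b))
```

Sorting the projections works because, in one dimension and with equal sample sizes, the optimal coupling pairs the order statistics of the two samples.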
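Selective classification is the task of rejecting inputs on which a model would predict incorrectly, through a trade-off between input space coverage and model accuracy. Current methods for selective classification impose constraints on either the model architecture or the loss function; this inhibits their usage in practice. In contrast to prior work, we show that state-of-the-art selective classification performance can be attained solely by studying the (discretized) training dynamics of a model. We propose a general framework that, for a given test input, monitors metrics capturing the disagreement of the intermediate models obtained during training with the final predicted label; we then reject data points that exhibit too much disagreement at late stages of training. In particular, we instantiate a method that tracks when the label predicted during training stops disagreeing with the final predicted label. Our experimental evaluation shows that our method achieves state-of-the-art accuracy/coverage trade-offs on typical selective classification benchmarks.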
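One simple way to read the "stops disagreeing" idea is to score each test point by the last training checkpoint whose prediction differs from the final model's prediction, and to reject points whose score is too late. The sketch below is a hypothetical instantiation for illustration only; the function names, the scoring rule, and the threshold are assumptions rather than the exact metric used in the abstract above.

```python
import numpy as np

def last_disagreement_step(checkpoint_preds):
    """Score each point by the last checkpoint that disagrees with the final label.

    checkpoint_preds has shape (n_checkpoints, n_points); the last row holds the
    final model's predictions. Scores are 1-based checkpoint indices, and 0 means
    the point never disagreed with the final prediction.
    """
    preds = np.asarray(checkpoint_preds)
    disagree = preds != preds[-1]
    steps = np.arange(1, preds.shape[0] + 1)[:, None]
    return np.max(np.where(disagree, steps, 0), axis=0)

def accept_mask(checkpoint_preds, max_step):
    # Accept points whose predictions settled at or before max_step.
    return last_disagreement_step(checkpoint_preds) <= max_step

# Four checkpoints, three test points (toy labels, assumed for illustration).
preds = [
    [0, 1, 2],
    [0, 0, 2],   # point 1 disagrees with its final label here
    [0, 1, 1],   # point 2 disagrees with its final label here
    [0, 1, 2],   # final model
]
print(last_disagreement_step(preds))   # [0 2 3]
print(accept_mask(preds, max_step=2))  # [ True  True False]
```

Raising `max_step` increases coverage at the cost of accepting points whose predictions settled late; sweeping it traces out an accuracy/coverage curve.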
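Although 3D point cloud classification has recently been widely deployed in different application scenarios, it is still very vulnerable to adversarial attacks. This increases the importance of robust training of 3D models in the face of adversarial attacks. Based on our analysis of the performance of existing adversarial attacks, more adversarial perturbations are found in the mid- and high-frequency components of the input data. Therefore, suppressing the high-frequency content in the training phase improves the robustness of the models against adversarial examples. Experiments show that the proposed defense method decreases the success rate of six attacks on the PointNet, PointNet++, and DGCNN models. In particular, compared with state-of-the-art methods, the average classification accuracy improves by 3.8% on the Drop100 attack and by 3.8% on the Drop200 attack. The method also improves model accuracy on the original dataset compared with other available methods.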
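In this paper, we study the numerical instabilities of the 5- and 7-point problems for essential and fundamental matrix estimation in multiview geometry. In both cases, we characterize the ill-posed world scenes where the condition number for epipolar estimation is infinite. We also characterize the ill-posed instances in terms of the given image data. To arrive at these results, we present a general framework, based on Riemannian manifolds, for analyzing the conditioning of minimal problems in multiview geometry. Experiments with synthetic and real-world data then reveal a striking conclusion: Random Sample Consensus (RANSAC) in Structure-from-Motion (SfM) not only serves to filter out outliers, but also selects for well-conditioned image data, sufficiently separated from the ill-posed locus that our theory predicts. Our findings suggest that, in future work, one could try to accelerate RANSAC and increase its success rate by testing only well-conditioned image data.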
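We present the Brown Pedestrian Odometry Dataset (BPOD) for benchmarking visual odometry algorithms in head-mounted pedestrian settings. This dataset was captured using synchronized global and rolling shutter stereo cameras in 12 diverse indoor and outdoor locations on Brown University's campus. Compared with existing datasets, BPOD contains more image blur and self-rotation, which are common in pedestrian odometry but rare elsewhere. Ground-truth trajectories are generated from stick-on markers placed along the pedestrian's path, and the pedestrian's position is documented using a third-person video. We evaluate the performance of representative direct, feature-based, and learning-based VO methods on BPOD. Our results show that significant development is needed to successfully capture pedestrian trajectories. The link to the dataset is here: https://doi.org/10.26300/c1n7-7p93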